Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | في | 26 | ذلك |
2 | من | 27 | العام |
3 | على | 28 | كل |
4 | أن | 29 | وفي |
5 | إلى | 30 | رئيس |
6 | التي | 31 | الرئيس |
7 | عن | 32 | قبل |
8 | مع | 33 | هو |
9 | الذي | 34 | عام |
10 | ان | 35 | المتحدة |
11 | ما | 36 | حيث |
12 | هذا | 37 | غير |
13 | إن | 38 | منذ |
14 | لا | 39 | بن |
15 | هذه | 40 | أنه |
16 | الى | 41 | السوري |
17 | بعد | 42 | أي |
18 | بين | 43 | كانت |
19 | وقال | 44 | عدد |
20 | خلال | 45 | وقد |
21 | أو | 46 | ، |
22 | كما | 47 | حتى |
23 | لم | 48 | وكان |
24 | قد | 49 | الحكومة |
25 | كان | 50 | بعض |
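The table shows the 50 most frequent words of the corpus. At these ranks, most entries are usually stopwords, that is, function words such as prepositions, conjunctions, and pronouns.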
Language: Arabic
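This list is a good candidate for a first stopword list for the language.
Usually a small, balanced corpus is enough to obtain a good list of high-frequency words. If the small corpus has a very prominent topic, however, that topic will be visible even in the top word lists. In the table above, for example, topical words such as السوري ("the Syrian", rank 41) and الحكومة ("the government", rank 49) appear among the function words.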
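As a rough illustration of how such a ranked list can be derived from raw text, the following Python sketch counts tokens and keeps the most frequent ones as stopword candidates. The function name `top_words`, the whitespace tokenization, and the English toy sentence are assumptions made for this example only; they are not the method used to build the corpus.

```python
from collections import Counter


def top_words(tokens, n=50):
    """Return the n most frequent tokens as (rank, word) pairs, ranks starting at 1."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word) for rank, (word, _) in enumerate(ranked[:n], start=1)]


# Toy input (hypothetical); a real corpus would need proper tokenization.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
candidates = top_words(tokens, n=3)
print(candidates)
```

A list produced this way still needs manual review, since topical words from a skewed corpus would be ranked alongside genuine stopwords.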
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
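The query can be tried out on a small in-memory SQLite database, as sketched below. The schema is an assumption inferred from the query itself: a `words` table with columns `w_id` and `word`, in which the first 100 ids are taken to be reserved for non-word entries, so that the most frequent word has `w_id` 101. The placeholder entries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: ids 1..100 are reserved; these placeholder rows are hypothetical.
rows = [(i, f"<reserved-{i}>") for i in range(1, 101)]
# The first three words from the table above, stored in frequency order.
rows += [(101, "في"), (102, "من"), (103, "على")]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

# Same query as in the text: subtracting 100 turns the id into a rank.
query = """
    SELECT w_id - 100 AS rank_in_wordlist, word
    FROM words
    WHERE w_id > 100
    ORDER BY w_id
    LIMIT 50
"""
top = conn.execute(query).fetchall()
conn.close()

for rank, word in top:
    print(rank, word)
```

Under these assumptions, the reserved rows are filtered out by the `WHERE` clause and the remaining ids map directly to ranks 1, 2, 3, and so on.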
3.4 Sample words for different frequency ranges